ABSTRACT

Confidence weighting (CW) tends to improve the 
reliability of easy tests; the Coombs-type multiple-response (MR) 
option tends to improve the reliability of hard tests. It was 
hypothesized that, on a test of moderate difficulty, offering both 
the CW and MR response options would improve reliability more than 
either alone. Twenty-four subjects took a 20-item multiple-choice 
test under CW plus MR instructions. MR was used less than CW; 9
subjects used both options. Coefficient alphas computed on four 
scoring bases showed MR, alone, depressed reliability a little; CW, 
alone, depressed it a lot; and the two combined depressed it even 
more. It was concluded that these two previously successful special 
testing procedures cannot be combined to form an even better one. 
A good idea that failed



Alfred D. Garvin
University of Cincinnati 

Opponents of confidence weighting (CW), e.g., Swineford (1930, 1941)
and Jacobs (1968), have argued that CW confounds academic achievement with
irrelevant personality traits and operates to favor the inherently confident
student over the equally knowledgeable but inherently diffident one.
Proponents of CW, e.g., Ebel (1965a,b) and Garvin (1969, 1972), have largely
ignored this criticism, arguing instead that, to a greater or lesser degree,
CW generally accomplishes what it is intended to accomplish: improve test
reliability. These same two rationally orthogonal arguments have also been
raised regarding the psychologically complementary Coombs-type
multiple-response (MR) option.
Garvin and Ralston (1970) considered the uneven success of CW and MR
across different testing situations and theorised that the relative efficacy
of CW and MR was a function of test difficulty: CW would "work" with
easy tests by permitting extra knowledge to be displayed (and rewarded);
MR would work with hard tests by permitting partial knowledge to be
displayed (and rewarded). In their limited empirical test, this theory was
supported: on a relatively hard course pretest taken under MR instructions
by one group and CW instructions by an equivalent group, MR scores were
more reliable than the corresponding conventional scores while CW scores
were less reliable. They concluded that either CW or MR (but not both)
would "work" in any given testing situation, depending on test difficulty.
They implicitly dismissed the possibility that neither would work.

The present study was designed to cast light on all of the foregoing
propositions. In a typical group, a typical test will be easy for some and
hard for others. Why not provide a wide range of response options so as to
elicit the extra knowledge of the better students and the partial knowledge
of the poorer ones? Surely, the more opportunity to display knowledge, the
more reliable the test.

Method 



Subjects 

The Ss were the 24 graduate education majors enrolled in the author's
course in Measurement and Evaluation.

Test 



The test involved was a midterm exam comprising 20 four-choice
multiple-choice items on basic test construction principles.

Procedure

Each S was permitted to answer each item in any one of three ways,
according to his confidence in his answer. The response options available
and their corresponding score contingencies were as follows:

Best answer (BA)            Simply indicate the one best answer.
                            1 point if right; 0 if wrong.

Confidence weighting (CW)   Circle your best answer selection.
                            2 points if right; -2 if wrong.

Multiple response (MR)      Indicate your first choice and second choice.
                            1/2 point if either is right; 0 if both wrong.

The procedures for indicating each response option and scoring such
responses were carefully explained and illustrated through examples. A
generous time limit was allowed for the test.

Analysis of data 

All responses were scored four different ways:

BA      CW and BA responses and the first choices of MR responses were
        all scored on a BA (1 or 0) basis.

CW      CW responses were scored as 2 or -2; BA responses and the first
        choices of MR responses were scored as 1 or 0.

MR      CW and BA responses were scored as 1 or 0; MR responses were
        scored as 1/2 if either choice was right, otherwise as 0.

CW+MR   CW responses were scored as 2 or -2; BA responses were scored
        as 1 or 0; and MR responses were scored as 1/2 (if either was
        right) or 0.
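
The four scoring bases can be summarized in a short sketch. This is an illustration only, not the author's scoring procedure as implemented: the function name `score_response` and its parameters are hypothetical, and the half-point MR credit is assumed from the score contingencies described under Procedure.

```python
def score_response(mode, first_right, second_right=False, basis="CW+MR"):
    """Score one item response on one of the four scoring bases.

    mode         -- option the examinee used: "BA", "CW", or "MR"
    first_right  -- True if the (first) choice matches the key
    second_right -- True if the second choice matches the key (MR only)
    basis        -- scoring basis: "BA", "CW", "MR", or "CW+MR"

    All names here are hypothetical; they do not come from the paper.
    """
    # CW responses earn 2 or -2 only on bases that honor confidence weights.
    if mode == "CW" and basis in ("CW", "CW+MR"):
        return 2.0 if first_right else -2.0
    # MR responses earn 1/2 for either choice right only on MR-honoring bases.
    if mode == "MR" and basis in ("MR", "CW+MR"):
        return 0.5 if (first_right or second_right) else 0.0
    # Everything else falls back to best-answer scoring of the first choice.
    return 1.0 if first_right else 0.0
```

Scoring the same response sheet once per basis yields the four score sets whose reliabilities are compared.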

A coefficient alpha reliability was computed for each of these four 
sets of scores. Rank-order correlations were computed between certain 
variables of interest, as explained more fully in the next section. 
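
For reference, coefficient alpha for k items is k/(k-1) times one minus the ratio of the summed item variances to the variance of total scores. The sketch below computes it from a subjects-by-items score matrix; the function name and the data layout are assumptions made for illustration, and the paper does not say how the computation was carried out.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows of item scores.

    scores -- list of equal-length lists, one row per examinee
              (hypothetical layout, not specified in the paper).
    Assumes at least two items and non-zero total-score variance.
    """
    k = len(scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the examinees' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Because the formula uses a ratio of variances, population and sample variance give the same alpha as long as one convention is used throughout.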
Results

Most of the Ss "played the game"; the distribution of Ss by response
options exercised was:

BA only      2
BA+MR        2
BA+CW       11
BA+CW+MR     9
Every item of the test received some special responses. Nine of the 20 items
received only CW responses and the other 11 received both CW and MR responses.

Most of the Ss played the game quite intelligently. In general, Ss
with the highest BA scores weighted the most items and the items that they
weighted were the easiest ones; Ss with the lowest BA scores gave the most
second choices and they did so on the hardest items. Rank-order
correlations among these variables were all in the direction implied and were
significant at the .05 level.
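
The paper does not name the rank-order coefficient used. Assuming Spearman's rho, one such correlation (for example, between BA score and number of items weighted) could be computed as sketched below; the function names are hypothetical, and ties receive average ranks.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks.

    Assumes neither variable is constant (non-zero rank variance).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```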

The distribution of BA scores was symmetrical and platykurtic. The 
mean was 12.3; the standard deviation was 2.8. The coefficient alphas 
for the four methods of scoring were: 

BA only   .536
CW        .408
MR        .471
CW+MR     .372
Discussion

The argument that most students are either inherently confident or
inherently diffident in test-taking received little support here. When
given the chance to respond in either, neither, or both of these ways in
a single test, 9 out of 24 Ss did some of each. Risk-taking behavior is
better explained as a rational reaction to perceived item difficulty than
as an inherent personality trait.

The difficulty level of this test (on a BA-score basis) was so close 
to the theoretical ideal for maximum discrimination that it should be said
to have been of moderate difficulty for this group. We might have expected 
relatively small but equal proportions of responses to have been given in 
the CW and the MR modes. Further, we might have expected that neither CW 
nor MR, alone, nor their combination would have much effect on reliability. 
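
The paper does not state which rule defines the "theoretical ideal." One common rule of thumb, assumed here purely for illustration, places the ideal mean score midway between the chance score and a perfect score. For 20 four-choice items that midpoint is 12.5, against the observed BA mean of 12.3. The function name below is hypothetical.

```python
def ideal_mean_score(n_items, n_choices):
    """Midpoint between the chance score and a perfect score.

    This is an assumed rule of thumb, not a formula given in the paper.
    """
    chance = n_items / n_choices      # expected score from blind guessing
    return (chance + n_items) / 2     # halfway between chance and perfect


print(ideal_mean_score(20, 4))  # 12.5
```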

The actual results are interesting but dismaying. The MR option was
used a little and it depressed reliability a little; the CW option was used 
a lot and it depressed reliability a lot. Worse still, the individual 
effects of these two options in depressing reliability seem to be additive 
when they are combined. 

It remains to be seen whether a much easier test would show CW to be 
effective and whether a much harder test would show MR to be effective, as 
hypothesised. Clearly, replications of this experiment are necessary. In 
the meantime, we must give serious attention to the present evidence that 
two previously successful testing procedures do not combine to form an even 
better one. 